Overview

Dataset statistics

Number of variables: 9
Number of observations: 45312
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.4 MiB
Average record size in memory: 124.2 B

Variable types

NUM: 8
CAT: 1

Reproduction

Analysis started: 2020-03-22 12:10:16.512411
Analysis finished: 2020-03-22 12:10:51.793781
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
Warning (Skewed): vicprice is highly skewed (γ1 = 78.68700693)
Warning (Zeros): period has 944 (2.1%) zeros
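
The two warnings above rest on simple quantities: a skewness coefficient γ1 and the share of zero values. The sketch below shows one way to compute both with the standard library. The function names are hypothetical, and whether the report uses the bias-corrected skewness estimator is an assumption, not something the report states.

```python
import math


def skewness(values, bias_corrected=True):
    """Moment coefficient of skewness of a sample.

    With bias_corrected=False this is g1 = m3 / m2**1.5 (population form).
    With bias_corrected=True the usual sample adjustment is applied
    (assumed here to match what the report prints; needs at least 3 values).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    if not bias_corrected:
        return g1
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy data: one large value drags the tail to the right.
toy = [0, 0, 0, 0, 10]
print(skewness(toy, bias_corrected=False))
print(zero_share(toy))

# The zeros warning for `period`, recomputed from the counts in the report.
print(round(100 * 944 / 45312, 1))
```

A large positive value, as reported for vicprice, indicates a long right tail: most observations sit near the bottom of the range while a few are close to the maximum.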

Variables

date
Real number (ℝ≥0)

Distinct count: 933
Unique (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4990795572
Minimum: 0
Maximum: 1
Zeros: 48
Zeros (%): 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.005133
Q1: 0.031934
Median: 0.456329
Q3: 0.88054725
95-th percentile: 0.907969
Maximum: 1
Range: 1
Interquartile range (IQR): 0.84861325

Descriptive statistics

Standard deviation: 0.3403077218
Coefficient of variation (CV): 0.6818706895
Kurtosis: -1.318137405
Mean: 0.4990795572
Median Absolute Deviation (MAD): 0.2825371897
Skewness: -0.1658145258
Sum: 22614.2929
Variance: 0.1158093455
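
Several of the statistics listed for date are derived from others, so they can be cross-checked with a few lines of arithmetic. The sketch below recomputes range, IQR, CV, variance, and sum from the reported mean, standard deviation, quartiles, and observation count. The variable names are illustrative and the inputs are copied from this report, so the results agree only up to the rounding of the printed values.

```python
# Values copied from the report for the `date` variable.
n = 45312
mean = 0.4990795572
std = 0.3403077218
q1, q3 = 0.031934, 0.88054725
minimum, maximum = 0.0, 1.0

derived = {
    "range": maximum - minimum,  # max - min
    "iqr": q3 - q1,              # Q3 - Q1
    "cv": std / mean,            # standard deviation / mean
    "variance": std ** 2,        # square of the standard deviation
    "sum": mean * n,             # mean times number of observations
}

reported = {
    "range": 1.0,
    "iqr": 0.84861325,
    "cv": 0.6818706895,
    "variance": 0.1158093455,
    "sum": 22614.2929,
}

for name, value in derived.items():
    print(f"{name:9s} derived={value:.10g} reported={reported[name]:.10g}")
```

The same relationships hold for the other numeric variables in this report, which makes them a quick sanity check when values are copied by hand.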
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 2.200000e-05 1.040000e-03 4.181000e-03 5.420000e-03 ... 9.156450e-01 9.157780e-01 9.711075e-01 9.932305e-01 1.000000e+00], "bayesian blocks" binning strategy used)

Most frequent values
Value                Count   Frequency (%)
0.898102             96      0.2%
0.875979             96      0.2%
0.442326             96      0.2%
0.871731             96      0.2%
0.473563             96      0.2%
0.876156             96      0.2%
0.867307             96      0.2%
0.880404             96      0.2%
0.893677             96      0.2%
0.889253             96      0.2%
Other values (923)   44352   97.9%

Smallest values
Value                Count   Frequency (%)
0                    48      0.1%
4.4e-05              48      0.1%
8.8e-05              48      0.1%
0.000133             48      0.1%
0.000177             48      0.1%

Largest values
Value                Count   Frequency (%)
1                    48      0.1%
0.995575             48      0.1%
0.995443             48      0.1%
0.991018             48      0.1%
0.986594             48      0.1%

day
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.003177966
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.998694937
Coefficient of variation (CV): 0.4992770629
Kurtosis: -1.248878713
Mean: 4.003177966
Median Absolute Deviation (MAD): 1.713374838
Skewness: -0.001187819335
Sum: 181392
Variance: 3.994781452
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.5 6.5 7. ], "bayesian blocks" binning strategy used)

Most frequent values
Value   Count   Frequency (%)
7       6480    14.3%
6       6480    14.3%
5       6480    14.3%
4       6480    14.3%
3       6480    14.3%
2       6480    14.3%
1       6432    14.2%

Smallest values
Value   Count   Frequency (%)
1       6432    14.2%
2       6480    14.3%
3       6480    14.3%
4       6480    14.3%
5       6480    14.3%

Largest values
Value   Count   Frequency (%)
7       6480    14.3%
6       6480    14.3%
5       6480    14.3%
4       6480    14.3%
3       6480    14.3%

period
Real number (ℝ≥0)

ZEROS
Distinct count: 48
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5
Minimum: 0
Maximum: 1
Zeros: 944
Zeros (%): 2.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.042553
Q1: 0.25
Median: 0.5
Q3: 0.75
95-th percentile: 0.957447
Maximum: 1
Range: 1
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.2947563828
Coefficient of variation (CV): 0.5895127656
Kurtosis: -1.201042076
Mean: 0.5
Median Absolute Deviation (MAD): 0.2553190833
Skewness: 8.506244433e-17
Sum: 22656
Variance: 0.0868813252
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0106385 0.9893615 1. ], "bayesian blocks" binning strategy used)

Most frequent values
Value               Count   Frequency (%)
0.638298            944     2.1%
0.276596            944     2.1%
0.553191            944     2.1%
0.085106            944     2.1%
0.382979            944     2.1%
0.595745            944     2.1%
0.680851            944     2.1%
0.489362            944     2.1%
0.617021            944     2.1%
0.808511            944     2.1%
Other values (38)   35872   79.2%

Smallest values
Value               Count   Frequency (%)
0                   944     2.1%
0.021277            944     2.1%
0.042553            944     2.1%
0.06383             944     2.1%
0.085106            944     2.1%

Largest values
Value               Count   Frequency (%)
1                   944     2.1%
0.978723            944     2.1%
0.957447            944     2.1%
0.93617             944     2.1%
0.914894            944     2.1%
 

nswprice
Real number (ℝ≥0)

Distinct count: 4089
Unique (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05786831014
Minimum: 0
Maximum: 1
Zeros: 29
Zeros (%): 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02681
Q1: 0.035127
Median: 0.048652
Q3: 0.074336
95-th percentile: 0.106611
Maximum: 1
Range: 1
Interquartile range (IQR): 0.039209

Descriptive statistics

Standard deviation: 0.03999076895
Coefficient of variation (CV): 0.69106509
Kurtosis: 158.5274187
Mean: 0.05786831014
Median Absolute Deviation (MAD): 0.02328855723
Skewness: 9.073261555
Sum: 2622.128869
Variance: 0.001599261602
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 3.000000e-05 1.022300e-02 1.258000e-02 1.343500e-02 ... 1.792210e-01 1.796565e-01 2.311010e-01 3.982825e-01 1.000000e+00], "bayesian blocks" binning strategy used)

Most frequent values
Value                 Count   Frequency (%)
0.074817              383     0.8%
0.055242              272     0.6%
0.058394              266     0.6%
0.028942              219     0.5%
0.041732              200     0.4%
0.041431              175     0.4%
0.028822              152     0.3%
0.029422              151     0.3%
0.044134              150     0.3%
0.041642              149     0.3%
Other values (4079)   43195   95.3%

Smallest values
Value                 Count   Frequency (%)
0                     29      0.1%
6e-05                 1       < 0.1%
0.000901              2       < 0.1%
0.001831              1       < 0.1%
0.001861              1       < 0.1%

Largest values
Value                 Count   Frequency (%)
1                     1       < 0.1%
0.981806              2       < 0.1%
0.979975              1       < 0.1%
0.970908              1       < 0.1%
0.968296              1       < 0.1%
 

nswdemand
Real number (ℝ≥0)

Distinct count: 5266
Unique (%): 11.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4254178953
Minimum: 0
Maximum: 1
Zeros: 1
Zeros (%): < 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.143707
Q1: 0.309134
Median: 0.4436925
Q3: 0.536001
95-th percentile: 0.684023
Maximum: 1
Range: 1
Interquartile range (IQR): 0.226867

Descriptive statistics

Standard deviation: 0.1633227142
Coefficient of variation (CV): 0.3839112459
Kurtosis: -0.4111293582
Mean: 0.4254178953
Median Absolute Deviation (MAD): 0.1328403352
Skewness: -0.09773906855
Sum: 19276.53567
Variance: 0.02667430897
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.036522 0.053481 0.0823415 0.0978135 ... 0.777819 0.820217 0.881583 0.9283695 1. ], "bayesian blocks" binning strategy used)

Most frequent values
Value                 Count   Frequency (%)
0.488099              33      0.1%
0.478578              32      0.1%
0.487057              31      0.1%
0.451056              30      0.1%
0.489586              30      0.1%
0.480512              29      0.1%
0.517852              29      0.1%
0.4759                29      0.1%
0.473222              29      0.1%
0.50967               28      0.1%
Other values (5256)   45012   99.3%

Smallest values
Value                 Count   Frequency (%)
0                     1       < 0.1%
0.00119               1       < 0.1%
0.001488              1       < 0.1%
0.003868              1       < 0.1%
0.008182              1       < 0.1%

Largest values
Value                 Count   Frequency (%)
1                     1       < 0.1%
0.980809              1       < 0.1%
0.962511              1       < 0.1%
0.960875              1       < 0.1%
0.956561              1       < 0.1%
 

vicprice
Real number (ℝ≥0)

SKEWED
Distinct count: 3798
Unique (%): 8.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.003467033898
Minimum: 0
Maximum: 1
Zeros: 35
Zeros (%): 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.001525
Q1: 0.002277
Median: 0.003467
Q3: 0.003467
95-th percentile: 0.00586
Maximum: 1
Range: 1
Interquartile range (IQR): 0.00119

Descriptive statistics

Standard deviation: 0.01021303821
Coefficient of variation (CV): 2.945756665
Kurtosis: 7047.774288
Mean: 0.003467033898
Median Absolute Deviation (MAD): 0.001110384583
Skewness: 78.68700693
Sum: 157.09824
Variance: 0.0001043061495
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 2.90000e-05 5.09000e-04 5.75500e-04 6.59500e-04 ... 1.34085e-02 1.93275e-02 3.03330e-02 6.63650e-02 1.00000e+00], "bayesian blocks" binning strategy used)

Most frequent values
Value                 Count   Frequency (%)
0.003467              17427   38.5%
0.000841              223     0.5%
0.002009              130     0.3%
0.000839              90      0.2%
0.000976              89      0.2%
0.000837              78      0.2%
0.001973              59      0.1%
0.001853              54      0.1%
0.001793              53      0.1%
0.001959              52      0.1%
Other values (3788)   27057   59.7%

Smallest values
Value                 Count   Frequency (%)
0                     35      0.1%
5.8e-05               3       < 0.1%
0.000118              3       < 0.1%
0.000129              2       < 0.1%
0.000139              1       < 0.1%

Largest values
Value                 Count   Frequency (%)
1                     1       < 0.1%
0.996228              1       < 0.1%
0.980682              1       < 0.1%
0.837239              1       < 0.1%
0.509112              1       < 0.1%
 

vicdemand
Real number (ℝ≥0)

Distinct count: 2846
Unique (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4229150757
Minimum: 0
Maximum: 1
Zeros: 1
Zeros (%): < 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.210772
Q1: 0.372346
Median: 0.422915
Q3: 0.46925175
95-th percentile: 0.641119
Maximum: 1
Range: 1
Interquartile range (IQR): 0.09690575

Descriptive statistics

Standard deviation: 0.1209653455
Coefficient of variation (CV): 0.2860275087
Kurtosis: 0.9089617644
Mean: 0.4229150757
Median Absolute Deviation (MAD): 0.07964014647
Skewness: 0.1595328979
Sum: 19163.12791
Variance: 0.01463261481
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.02913 0.0789745 0.100207 0.1484985 ... 0.7580265 0.7971255 0.841792 0.8907305 1. ], "bayesian blocks" binning strategy used)

Most frequent values
Value                 Count   Frequency (%)
0.422915              17425   38.5%
0.562144              28      0.1%
0.494562              28      0.1%
0.542206              28      0.1%
0.333506              28      0.1%
0.328068              28      0.1%
0.51709               27      0.1%
0.340497              27      0.1%
0.506732              26      0.1%
0.287675              26      0.1%
Other values (2836)   27641   61.0%

Smallest values
Value                 Count   Frequency (%)
0                     1       < 0.1%
0.024081              1       < 0.1%
0.025634              1       < 0.1%
0.028742              1       < 0.1%
0.029518              1       < 0.1%

Largest values
Value                 Count   Frequency (%)
1                     1       < 0.1%
0.996893              1       < 0.1%
0.991196              1       < 0.1%
0.989384              1       < 0.1%
0.983687              1       < 0.1%
 

transfer
Real number (ℝ≥0)

Distinct count: 1878
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5005263909
Minimum: 0
Maximum: 1
Zeros: 1
Zeros (%): < 0.1%
Memory size: 354.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.29760945
Q1: 0.414912
Median: 0.414912
Q3: 0.605702
95-th percentile: 0.801754
Maximum: 1
Range: 1
Interquartile range (IQR): 0.19079

Descriptive statistics

Standard deviation: 0.1533733911
Coefficient of variation (CV): 0.3064241845
Kurtosis: -0.08657400985
Mean: 0.5005263909
Median Absolute Deviation (MAD): 0.1263068164
Skewness: 0.6705411339
Sum: 22679.85183
Variance: 0.02352339711
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.074781 0.1217105 0.16557 0.19057 ... 0.902851 0.9039475 0.9282895 0.9458335 1. ], "bayesian blocks" binning strategy used)

Most frequent values
Value                 Count   Frequency (%)
0.414912              17480   38.6%
0.500526              338     0.7%
0.765789              220     0.5%
0.838158              75      0.2%
0.634211              68      0.2%
0.67807               58      0.1%
0.841667              57      0.1%
0.662719              56      0.1%
0.857018              56      0.1%
0.72193               54      0.1%
Other values (1868)   26850   59.3%

Smallest values
Value                 Count   Frequency (%)
0                     1       < 0.1%
0.000439              1       < 0.1%
0.002632              1       < 0.1%
0.004386              1       < 0.1%
0.00614               1       < 0.1%

Largest values
Value                 Count   Frequency (%)
1                     1       < 0.1%
0.959211              1       < 0.1%
0.957895              1       < 0.1%
0.946053              1       < 0.1%
0.945614              1       < 0.1%
 

class
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 354.1 KiB

Value   Count   Frequency (%)
DOWN    26075   57.5%
UP      19237   42.5%

Length

Max length: 4
Mean length: 3.150909251
Min length: 2
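
The length statistics for class follow directly from the two label counts, since "DOWN" has four characters and "UP" has two. The short sketch below recomputes the shares and the mean length from the counts shown in this report; the variable names are illustrative only.

```python
# Label counts copied from the report for the `class` variable.
counts = {"DOWN": 26075, "UP": 19237}

total = sum(counts.values())

# Share of each label as a percentage of all observations.
shares = {label: 100 * count / total for label, count in counts.items()}

# Mean string length is the count-weighted average of the label lengths.
mean_length = sum(len(label) * count for label, count in counts.items()) / total

print(total)
print({label: round(share, 1) for label, share in shares.items()})
print(mean_length)
print(max(len(label) for label in counts), min(len(label) for label in counts))
```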
Value              Count   Frequency (%)
Uppercase_Letter   6       100.0%

Value              Count   Frequency (%)
Latin              6       100.0%

Value              Count   Frequency (%)
ASCII              6       100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
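
The recipe above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the code the report itself runs; the function name `pearson_r` is hypothetical. The 1/n factors of the covariance and the standard deviations cancel, so the sketch works with plain sums.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross products and squared deviations (common 1/n factor cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Shifting and rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

The function divides by zero if either variable is constant, which mirrors the fact that r is undefined in that case.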

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
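
As a sketch of that definition, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is a common convention and is assumed here; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (dev_x * dev_y)


# A cubic relationship is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
print(average_ranks([10, 20, 20, 30]))
```

Because only the ordering of the values matters, any strictly increasing transformation of either variable leaves ρ unchanged.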

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
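
The pair-counting description corresponds to the variant usually called tau-a, sketched below with the standard library. Statistical libraries often report tau-b instead, which adjusts the denominator for ties; which variant this report uses is not stated here, so treat the sketch as an illustration of the definition only. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    return (concordant - discordant) / pairs


x = [1, 2, 3]
y = [1, 3, 2]
# Pairs: (1,1)-(2,3) concordant, (1,1)-(3,2) concordant, (2,3)-(3,2) discordant.
print(kendall_tau_a(x, y))
```

The loop visits every pair, so the cost grows quadratically with the number of observations; for a dataset of this size a library implementation would be the practical choice.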

Missing values

Sample

First rows

index   date   day   period     nswprice   nswdemand   vicprice   vicdemand   transfer   class
0       0.0    2     0.000000   0.056443   0.439155    0.003467   0.422915    0.414912   UP
1       0.0    2     0.021277   0.051699   0.415055    0.003467   0.422915    0.414912   UP
2       0.0    2     0.042553   0.051489   0.385004    0.003467   0.422915    0.414912   UP
3       0.0    2     0.063830   0.045485   0.314639    0.003467   0.422915    0.414912   UP
4       0.0    2     0.085106   0.042482   0.251116    0.003467   0.422915    0.414912   DOWN
5       0.0    2     0.106383   0.041161   0.207528    0.003467   0.422915    0.414912   DOWN
6       0.0    2     0.127660   0.041161   0.171824    0.003467   0.422915    0.414912   DOWN
7       0.0    2     0.148936   0.041161   0.152782    0.003467   0.422915    0.414912   DOWN
8       0.0    2     0.170213   0.041161   0.134930    0.003467   0.422915    0.414912   DOWN
9       0.0    2     0.191489   0.041161   0.140583    0.003467   0.422915    0.414912   DOWN

Last rows

index   date     day   period     nswprice   nswdemand   vicprice   vicdemand   transfer   class
45302   0.9158   7     0.808511   0.055062   0.350193    0.003772   0.295961    0.365351   DOWN
45303   0.9158   7     0.829787   0.065420   0.353913    0.004508   0.319524    0.319737   UP
45304   0.9158   7     0.851064   0.055902   0.340077    0.003857   0.313568    0.375000   DOWN
45305   0.9158   7     0.872340   0.050648   0.322821    0.003488   0.305541    0.325877   DOWN
45306   0.9158   7     0.893617   0.058875   0.340226    0.004049   0.276541    0.351754   DOWN
45307   0.9158   7     0.914894   0.044224   0.340672    0.003033   0.255049    0.405263   DOWN
45308   0.9158   7     0.936170   0.044884   0.355549    0.003072   0.241326    0.420614   DOWN
45309   0.9158   7     0.957447   0.043593   0.340970    0.002983   0.247799    0.362281   DOWN
45310   0.9158   7     0.978723   0.066651   0.329366    0.004630   0.345417    0.206579   UP
45311   0.9158   7     1.000000   0.050679   0.288753    0.003542   0.355256    0.231140   DOWN